Garbage Collection Without Paging

Garbage Collection without Paging, Matthew Hertz, Yi Feng, Emery D. Berger. PLDI 2005

Garbage collection offers numerous software engineering advantages, but interacts poorly with virtual memory managers. Existing garbage collectors require far more pages than the application’s working set and touch pages without regard to which ones are in memory, especially during full-heap garbage collection. The resulting paging can cause throughput to plummet and pause times to spike up to seconds or even minutes.

We present a garbage collector that avoids paging. This bookmarking collector cooperates with the virtual memory manager to guide its eviction decisions. Using summary information (“bookmarks”) recorded from evicted pages, the collector can perform in-memory full-heap collections. In the absence of memory pressure, the bookmarking collector matches the throughput of the best collector we tested while running in smaller heaps. In the face of memory pressure, it improves throughput by up to a factor of five and reduces pause times by up to a factor of 45 over the next best collector. Compared to a collector that consistently provides high throughput (generational mark-sweep), the bookmarking collector reduces pause times by up to 218x and improves throughput by up to 41x. Bookmarking collection thus provides greater utilization of available physical memory than other collectors while matching or exceeding their throughput.
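The mechanism is easy to get a feel for in miniature. Below is a toy Python model of the idea, not the paper's implementation: the page and object representations are invented here, and the real collector cooperates with the OS virtual memory manager rather than with explicit evict() calls. When a page is about to be evicted, the collector scans it and bookmarks the in-memory objects it references; a later mark phase treats bookmarked objects as extra roots and never follows a pointer onto an evicted page.

    class Obj:
        def __init__(self, name):
            self.name = name
            self.refs = []            # outgoing references
            self.bookmarked = False
            self.page = None          # set when placed on a page

    class Page:
        def __init__(self):
            self.objects = []
            self.evicted = False

        def place(self, obj):
            obj.page = self
            self.objects.append(obj)

    def evict(page):
        # The VM notifies the collector before eviction; the collector
        # bookmarks every in-memory target of the page's outgoing refs.
        for obj in page.objects:
            for target in obj.refs:
                if not target.page.evicted:
                    target.bookmarked = True
        page.evicted = True

    def mark(roots, heap_pages):
        # Full-heap mark that never faults in an evicted page:
        # bookmarked objects stand in for whatever references the
        # evicted pages may hold.
        worklist = list(roots)
        for page in heap_pages:
            if not page.evicted:
                worklist += [o for o in page.objects if o.bookmarked]
        marked = set()
        while worklist:
            obj = worklist.pop()
            if obj in marked or obj.page.evicted:
                continue              # never touch an evicted page
            marked.add(obj)
            worklist.extend(obj.refs)
        return marked

Objects on evicted pages are conservatively retained; the paper also clears bookmarks when pages return to memory, which this sketch omits.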

Relationally-Parametric Polymorphic Contracts

Relationally-Parametric Polymorphic Contracts, Arjun Guha, Jacob Matthews, Robert Bruce Findler, Shriram Krishnamurthi. 2007

The analogy between types and contracts raises the question of how many features of static type systems can be expressed as dynamic contracts. An important feature missing in prior work on contracts is parametricity, as represented by the polymorphic types in languages like Standard ML.

We present a contract counterpart to parametricity. We explore multiple designs for such a system and present one that is simple and incurs minimal execution overhead. We show how to extend the notion of contract blame to our definition. We present a form of inference that can often save programmers from having to explicitly instantiate many parametric contracts. Finally, we present several examples that illustrate how this system mimics the feel and properties of parametric polymorphism in typed languages.

There's a really simple and elegant idea here: you can test at runtime whether a function uses a particular argument parametrically (i.e., doesn't examine it) by wrapping the argument, before passing it to the function, in a value that raises an error on any attempt to inspect it.
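Here is a hedged sketch of that trick in Python rather than the paper's Scheme; the names are invented, and Python cannot intercept every operation (identity tests, merely storing the value), so this conveys the flavor of the idea, not the paper's contract-and-blame machinery.

    class NotParametric(Exception):
        pass

    class Opaque:
        # A value that refuses to be examined: the usual ways of
        # inspecting it all raise an error.
        def _refuse(self, *args, **kwargs):
            raise NotParametric("function examined a parametric argument")
        __eq__ = __ne__ = __lt__ = __add__ = _refuse
        __len__ = __iter__ = __bool__ = __str__ = _refuse

    def length(xs):
        return sum(1 for _ in xs)     # never inspects the elements

    def first_plus_one(xs):
        return xs[0] + 1              # examines an element

    assert length([Opaque(), Opaque()]) == 2   # parametric in the elements
    try:
        first_plus_one([Opaque()])
    except NotParametric:
        print("first_plus_one is not parametric in its elements")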

JVM Languages group

Charles Nutter:

If you are interested in the future of non-Java languages on the JVM, you should be on this list. Yes, we talk about a lot of JVM language implementation challenges: we discuss compilers and stack frames and call-site optimizations, but we also talk about features peripheral to language implementation, like package indexing and retrofitting Java 5+ code. We need your help.

Guaranteed Optimization

"Guaranteed Optimization", in Active Libraries and Universal Languages, by Todd L. Veldhuizen. PhD thesis, 2004.

This chapter describes a new technique for designing optimizers that provide proven guarantees of what optimizations they perform. We use rewrite rules to capture common patterns of 'adding abstraction' to programs, and develop a proof technique for compilers based on superanalysis (Chapter 2) to prove that any sequence of rule applications is undone in a single step. Such compilers address the problem of 'abstraction penalty' (see Section 1.4.1) and offer programmers an intuitive guarantee of what improvements the compiler will perform. We also show that compilers based on this approach find 'optimal programs' in a carefully defined sense.

I know it's a little unusual to link to a chapter of a thesis instead of a paper, but on his web page the author says that the paper I was planning to link to ("Guaranteed Optimization: Proving Nullspace Properties of Compilers") has been superseded.

Anyway, the basic idea well illustrates the adage that a small change of POV can yield a lot of insight. Here, Veldhuizen changes the POV of a compiler writer from "optimization" to "de-pessimization", and gets a really nice result: an exact characterization of the specific kinds of inefficient patterns the compiler will eliminate. The value of this kind of predictability can be very large; functional programmers can rely on tail call optimization because we know exactly when the compiler will do the transformation.
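As a hedged toy version of that change of perspective (the term syntax and rules here are invented, and far simpler than the thesis's): each rewrite rule undoes one pattern of added abstraction, and the "optimizer" just runs them to a fixpoint. What the thesis proves, and this sketch does not, is that any sequence of rule applications is guaranteed to be undone in a single pass.

    def step(t):
        # One bottom-up pass applying every de-pessimization rule once.
        if not isinstance(t, tuple) or not t:
            return t
        t = tuple(step(c) for c in t)
        if t[0] == 'id' and len(t) == 2:
            return t[1]                    # id x           =>  x
        if t[0] == 'fst' and isinstance(t[1], tuple) and t[1][0] == 'pair':
            return t[1][1]                 # fst (pair a b) =>  a
        if t[0] == 'snd' and isinstance(t[1], tuple) and t[1][0] == 'pair':
            return t[1][2]                 # snd (pair a b) =>  b
        return t

    def normalize(t):
        while True:
            s = step(t)
            if s == t:
                return t                   # fixpoint: all added abstraction undone
            t = s

    # wrapping a value in pair/fst/id abstraction is guaranteed to vanish
    assert normalize(('fst', ('pair', ('id', 'x'), 'y'))) == 'x'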

Ralph Johnson: Erlang, the next Java

A nice blog post about Erlang. Nothing new here for those following the Erlang story, but a nice summary of several important themes nonetheless. Some choice quotes:

The thing that bugs me about [Armstrong's] book (and about his talks) is that he makes more fuss than he should about the functional language aspect of Erlang and not enough about the OO aspect. In fact, he denies that it is OO.

Processes in Erlang are objects. At the beginning of my course on OO design, I explain three views of OO programming. The Scandinavian view is that an OO system is one whose creators realize that programming is modeling. The mystical view is that an OO system is one that is built out of objects that communicate by sending messages to each other, and computation is the messages flying from object to object. The software engineering view is that an OO system is one that supports data abstraction, polymorphism by late-binding of function calls, and inheritance. Erlang is a perfect example of the actor model, which is an example of the mystical view.

Moreover, given several kinds of processes that implement a common protocol and share some behavior, it is easy to factor out the commonality into a function that they can both call. This is similar to class inheritance. So, you could even say that Erlang supports inheritance...
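In Erlang the factoring would typically live in a shared receive loop or a behaviour module; the Python sketch below (all names invented) shows only the functional skeleton of the point being made. Two kinds of "process" handle their own messages and delegate the shared protocol to a common function, so both respond to ping without duplicating that code.

    def handle_common(state, msg):
        # The shared part of the protocol, factored out.
        if msg == 'ping':
            return state, 'pong'
        if msg == 'stop':
            return None, 'stopped'
        return state, 'unknown'

    def counter(state, msg):
        if msg == 'incr':
            return state + 1, state + 1
        return handle_common(state, msg)   # fall back to the shared protocol

    def logger(state, msg):
        if isinstance(msg, tuple) and msg[0] == 'log':
            return state + [msg[1]], 'ok'
        return handle_common(state, msg)

    assert counter(0, 'incr') == (1, 1)
    assert counter(0, 'ping') == (0, 'pong')   # "inherited" behavior
    assert logger([], 'ping') == ([], 'pong')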

Experience Report: Scheme in Commercial Web Application Development

Interesting report by Noel and his colleagues, which for some reason was announced on the PLT Scheme blog and not here...

Domain-Specific Aspect Languages

Although the majority of work in the AOSD community focuses on general-purpose aspect languages (e.g., AspectJ), seminal work on AOSD proposed a number of domain-specific aspect languages, such as COOL for concurrency management, RIDL for serialization, RG, AML, and others. A growing trend of research in the AOSD community is returning to this seminal work.

Since it seems it is DSL week around here, and since Domain-Specific Aspect Languages were not discussed here before as far as I can remember, I think now may be an appropriate time to discuss this notion.

To begin the tour, head out to the web page of the first DSAL workshop, DSAL'06, which "approached domain-specific aspect languages from a language implementation point of view, where advances in the field of domain-specific language engineering were investigated to answer the implementation challenges of aspect languages," and then move over to DSAL'07, which dealt with the design and implementation of new domain-specific aspect languages in more detail.

Cloning-Based Context-Sensitive Pointer Alias Analysis Using Binary Decision Diagrams

Cloning-Based Context-Sensitive Pointer Alias Analysis Using Binary Decision Diagrams, John Whaley and Monica S. Lam. PLDI 2004.

This paper presents the first scalable context-sensitive, inclusion-based pointer alias analysis for Java programs. Our approach to context sensitivity is to create a clone of a method for every context of interest, and run a context-insensitive algorithm over the expanded call graph to get context-sensitive results. For precision, we generate a clone for every acyclic path through a program's call graph, treating methods in a strongly connected component as a single node. Normally, this formulation is hopelessly intractable as a call graph often has 10^14 acyclic paths or more. We show that these exponential relations can be computed efficiently using binary decision diagrams (BDDs). Key to the scalability of the technique is a context numbering scheme that exposes the commonalities across contexts. We applied our algorithm to the most popular applications available on Sourceforge, and found that the largest programs, with hundreds of thousands of Java bytecodes, can be analyzed in under 20 minutes.

This paper shows that pointer analysis, and many other queries and algorithms, can be described succinctly and declaratively using Datalog, a logic programming language. We have developed a system called bddbddb that automatically translates Datalog programs into highly efficient BDD implementations. We used this approach to develop a variety of context-sensitive algorithms including side effect analysis, type analysis, and escape analysis.

Binary decision diagrams are one of the coolest data structures of the last 20 years, and are also one of the basic enabling data structures underlying modern SAT solvers. Using them to implement Datalog is a really clever idea.

EDIT: I relied on memory instead of Google, and got it wrong about BDDs and SAT solving. Modern DPLL-based SAT solvers generally do not use BDDs, because when your solutions are far apart in the solution space the representing BDD would get quite large. BDDs are widely used in verification, though, and are still one of my favorite data structures. :)
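To make the Datalog connection concrete, here is a sketch of a standard Andersen-style points-to formulation (simplified: no call graph and no contexts, with invented input facts) evaluated naively with explicit sets. bddbddb evaluates this same kind of rules with the relations represented as BDDs, which is what lets the context-cloned version of the analysis scale.

    # Input facts:
    new_stmts    = {('v1', 'h1'), ('v2', 'h2')}   # v = new Obj()  at site h
    assign_stmts = {('v3', 'v1')}                 # v3 = v1
    store_stmts  = {('v3', 'f', 'v2')}            # v3.f = v2
    load_stmts   = {('v4', 'v1', 'f')}            # v4 = v1.f

    # Datalog rules:
    #   pts(V, H)      :- new(V, H).
    #   pts(V, H)      :- assign(V, W), pts(W, H).
    #   hpts(H, F, H2) :- store(V, F, W), pts(V, H), pts(W, H2).
    #   pts(V, H2)     :- load(V, W, F), pts(W, H), hpts(H, F, H2).

    pts, hpts = set(new_stmts), set()
    changed = True
    while changed:                                # naive bottom-up fixpoint
        changed = False
        for (v, w) in assign_stmts:
            for (w2, h) in list(pts):
                if w2 == w and (v, h) not in pts:
                    pts.add((v, h)); changed = True
        for (v, f, w) in store_stmts:
            for (v2, h) in list(pts):
                for (w2, h2) in list(pts):
                    if v2 == v and w2 == w and (h, f, h2) not in hpts:
                        hpts.add((h, f, h2)); changed = True
        for (v, w, f) in load_stmts:
            for (w2, h) in list(pts):
                for (h2, f2, h3) in list(hpts):
                    if w2 == w and f2 == f and h2 == h and (v, h3) not in pts:
                        pts.add((v, h3)); changed = True

    assert ('v4', 'h2') in pts   # v4 = v1.f sees v2's object via v3.f = v2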

Resources, Concurrency and Local Reasoning

Resources, Concurrency and Local Reasoning, Peter O'Hearn, 2007.

Resource has always been a central concern in concurrent programming. Often, a number of processes share access to system resources such as memory, processor time, or network bandwidth, and correct resource usage is essential for the overall working of a system. In the 1960s and 1970s Dijkstra, Hoare and Brinch Hansen attacked the problem of resource control in their basic works on concurrent programming. In addition to the use of synchronization mechanisms to provide protection from inconsistent use, they stressed the importance of resource separation as a means of controlling the complexity of process interactions and reducing the possibility of time-dependent errors. This paper revisits their ideas using the formalism of separation logic.

I have to admit that in posting this paper, I was at least slightly motivated by the imp of the perverse -- it's pretty much a commonplace among expert programmers that reasoning about programs that use shared memory, explicit mutual exclusion, and concurrency is very difficult. So the urge to post a paper showing one approach for doing it in a clean (and imo quite beautiful) way was irresistible.
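For readers who haven't seen concurrent separation logic, the rule doing most of the work is parallel composition: if each thread can be verified against its own slice of the heap, with the separating conjunction * asserting that the slices are disjoint, the threads compose with no interference reasoning at all. Roughly, eliding the side condition that neither thread modifies variables free in the other's assertions:

    \frac{\{P_1\}\; C_1\; \{Q_1\} \qquad \{P_2\}\; C_2\; \{Q_2\}}
         {\{P_1 * P_2\}\; C_1 \parallel C_2\; \{Q_1 * Q_2\}}

Locks then fit the same story: each lock protects a resource invariant, and acquiring or releasing the lock transfers ownership of that piece of state into or out of a thread's local slice.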

Apache Camel routing rules: a DSL?

Apache Camel is a powerful rule-based routing and mediation engine which provides a POJO-based implementation of the Enterprise Integration Patterns, using an extremely powerful fluent API (or declarative Java Domain Specific Language) to configure routing and mediation rules.

The authors probably meant embedded DSL. The examples of its use feel remarkably similar to "criteria queries" DSLs, as popularized in Java by Hibernate (and yes, many others as well).

To start a discussion: where is the boundary between an API and an embedded DSL? Is it in the eye of the beholder, or are there objective differences? Some APIs have a strong DSL feel, notably the various parser libraries in Haskell. Is it a quality of the host language (Haskell) or of the domain (parsers)?
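One way to make the question concrete: the sketch below, a hypothetical criteria-query builder in Python (not Camel's or Hibernate's actual API), is plain method dispatch at the definition site, yet call sites read like a small declarative language. Perhaps that call-site reading, rather than anything at the definition site, is the only real boundary.

    class Query:
        # Each method returns self, so chained calls read like a
        # declarative sentence even though this is an ordinary API.
        def __init__(self, table):
            self.table, self.filters, self.order = table, [], None

        def where(self, field, op, value):
            self.filters.append((field, op, value))
            return self

        def order_by(self, field):
            self.order = field
            return self

        def to_sql(self):
            sql = f"SELECT * FROM {self.table}"
            if self.filters:
                sql += " WHERE " + " AND ".join(
                    f"{f} {op} {v!r}" for f, op, v in self.filters)
            if self.order:
                sql += f" ORDER BY {self.order}"
            return sql

    print(Query("users").where("age", ">", 18).order_by("name").to_sql())
    # SELECT * FROM users WHERE age > 18 ORDER BY name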